I still remember the blizzard in January 27, 2015, we were out of heating because the extreme cold froze the critical part of my furnace, and no one could come to repair in heavy snow. Finally, it turned out that winter was recorded as the worst winter in the history of Boston! Not only the winter, we were also shocked by the recent hot summers. I always felt that summer was a comfortable season in Boston before, but we were surprised to have the warmest August on record in 2018 and 2016. As we experienced extreme weathers in the past several years, we would like to know the trend of weather conditions in Boston and some other cities in US in recent years.
Several years ago, my Mom came to US in winter and our whole family planned to go to Grand Canyon and California for a vacation. We were so excited and planned this trip since my mother got the visa months ago. We first arrived in New York for transfer, but we all were stuck in JFK airport because of the heavy snow, the flight was delay for more than 10 hours, and we were extremely tired, especially my daughter. The bad luck didn’t end after we boarded on the plane. It started to snow in the afternoon when we were leaving Grand Canyon. Driving downhill on the newly falling slippery snow in the dark was really a challenge even though I had been driving in Minnesota and Wisconsin for years. My little daughter even got a high fever, and we had to cancel all the plans afterwards. I always remember this frustrated experience, so we would like to make some recommendations to help visitors to choose best times to visit certain place or best places to visit in certain month.
Data was downloaded from Kaggle. The dataset contains around 5 years of high temporal resolution (hourly measurements) data of various weather attributes, including temperature, humidity, wind speed, and weather description, from 30 US & Canadian Cities and 6 Israeli Cities.
Here, we only analyzed the following 10 US coastal cities, which are well-known tourism cities. (west coast: Seattle, Portland, San Francisco, Los Angeles, and San Diego; east coast: Boston, New York, Philadelphia, Jacksonville, Miami)
library(readr)
temp <- read_csv("/Users/hongyanli/Desktop/BST260project/weather_data/temperature.csv")
hum <- read_csv("/Users/hongyanli/Desktop/BST260project/weather_data/humidity.csv")
wind <- read_csv("/Users/hongyanli/Desktop/BST260project/weather_data/wind_speed.csv")
des <- read_csv("/Users/hongyanli/Desktop/BST260project/weather_data/weather_description.csv")
library(dplyr)
library(lubridate)
temp=temp %>%
mutate(datetime=ymd_hm(datetime)) %>%
mutate(year=year(datetime), month=month(datetime), day=day(datetime), hour=hour(datetime)) %>%
filter(!(year==2017 & month > 9))
hum=hum %>%
mutate(datetime=ymd_hm(datetime)) %>%
mutate(year=year(datetime), month=month(datetime), day=day(datetime), hour=hour(datetime)) %>%
filter(!(year==2017 & month > 9))
wind=wind %>%
mutate(datetime=ymd_hm(datetime)) %>%
mutate(year=year(datetime), month=month(datetime), day=day(datetime), hour=hour(datetime)) %>%
filter(!(year==2017 & month > 9))
des=des %>%
mutate(datetime=ymd_hm(datetime)) %>%
mutate(year=year(datetime), month=month(datetime), day=day(datetime), hour=hour(datetime)) %>%
filter(!(year==2017 & month > 9))
# convert temperature from Kelvin to Celsius
datetime=temp[,1]
temp=cbind(datetime, temp[,c(2:28)]-273.15, temp[,c(29:32)])
temp$term[temp$year==2012 | (temp$year==2013 & temp$month<=9)]="2012.10-2013.09"
temp$term[(temp$year==2013 & temp$month>=10) | (temp$year==2014 & temp$month<=9)]="2013.10-2014.09"
temp$term[(temp$year==2014 & temp$month>=10) | (temp$year==2015 & temp$month<=9)]="2014.10-2015.09"
temp$term[(temp$year==2015 & temp$month>=10) | (temp$year==2016 & temp$month<=9)]="2015.10-2016.09"
temp$term[(temp$year==2016 & temp$month>=10) | (temp$year==2017 & temp$month<=9)]="2016.10-2017.09"
temp$term2[temp$term=="2012.10-2013.09"]="1"
temp$term2[temp$term=="2013.10-2014.09"]="2"
temp$term2[temp$term=="2014.10-2015.09"]="3"
temp$term2[temp$term=="2015.10-2016.09"]="4"
temp$term2[temp$term=="2016.10-2017.09"]="5"
des$term[des$year==2012 | (des$year==2013 & des$month<=9)]="2012.10-2013.09"
des$term[(des$year==2013 & des$month>=10) | (des$year==2014 & des$month<=9)]="2013.10-2014.09"
des$term[(des$year==2014 & des$month>=10) | (des$year==2015 & des$month<=9)]="2014.10-2015.09"
des$term[(des$year==2015 & des$month>=10) | (des$year==2016 & des$month<=9)]="2015.10-2016.09"
des$term[(des$year==2016 & des$month>=10) | (des$year==2017 & des$month<=9)]="2016.10-2017.09"
des$term2[des$term=="2012.10-2013.09"]="1"
des$term2[des$term=="2013.10-2014.09"]="2"
des$term2[des$term=="2014.10-2015.09"]="3"
des$term2[des$term=="2015.10-2016.09"]="4"
des$term2[des$term=="2016.10-2017.09"]="5"
hum$term[hum$year==2012 | (hum$year==2013 & hum$month<=9)]="2012.10-2013.09"
hum$term[(hum$year==2013 & hum$month>=10) | (hum$year==2014 & hum$month<=9)]="2013.10-2014.09"
hum$term[(hum$year==2014 & hum$month>=10) | (hum$year==2015 & hum$month<=9)]="2014.10-2015.09"
hum$term[(hum$year==2015 & hum$month>=10) | (hum$year==2016 & hum$month<=9)]="2015.10-2016.09"
hum$term[(hum$year==2016 & hum$month>=10) | (hum$year==2017 & hum$month<=9)]="2016.10-2017.09"
hum$term2[hum$term=="2012.10-2013.09"]="1"
hum$term2[hum$term=="2013.10-2014.09"]="2"
hum$term2[hum$term=="2014.10-2015.09"]="3"
hum$term2[hum$term=="2015.10-2016.09"]="4"
hum$term2[hum$term=="2016.10-2017.09"]="5"
wind$term[wind$year==2012 | (wind$year==2013 & wind$month<=9)]="2012.10-2013.09"
wind$term[(wind$year==2013 & wind$month>=10) | (wind$year==2014 & wind$month<=9)]="2013.10-2014.09"
wind$term[(wind$year==2014 & wind$month>=10) | (wind$year==2015 & wind$month<=9)]="2014.10-2015.09"
wind$term[(wind$year==2015 & wind$month>=10) | (wind$year==2016 & wind$month<=9)]="2015.10-2016.09"
wind$term[(wind$year==2016 & wind$month>=10) | (wind$year==2017 & wind$month<=9)]="2016.10-2017.09"
wind$term2[wind$term=="2012.10-2013.09"]="1"
wind$term2[wind$term=="2013.10-2014.09"]="2"
wind$term2[wind$term=="2014.10-2015.09"]="3"
wind$term2[wind$term=="2015.10-2016.09"]="4"
wind$term2[wind$term=="2016.10-2017.09"]="5"
temp=temp %>%
mutate(date=ymd(paste(year, month, day, sep="-"))) %>%
select(date, term, term2, year, month, day, hour, Seattle, Portland, `San Francisco`, `Los Angeles`, `San Diego`, Boston, `New York`, Philadelphia, Jacksonville, Miami)
des=des %>%
mutate(date=ymd(paste(year, month, day, sep="-"))) %>%
select(date, term, term2, year, month, day, hour, Seattle, Portland, `San Francisco`, `Los Angeles`, `San Diego`, Boston, `New York`, Philadelphia, Jacksonville, Miami)
hum=hum %>%
mutate(date=ymd(paste(year, month, day, sep="-"))) %>%
select(date, term, term2, year, month, day, hour, Seattle, Portland, `San Francisco`, `Los Angeles`, `San Diego`, Boston, `New York`, Philadelphia, Jacksonville, Miami)
wind=wind %>%
mutate(date=ymd(paste(year, month, day, sep="-"))) %>%
select(date, term, term2, year, month, day, hour, Seattle, Portland, `San Francisco`, `Los Angeles`, `San Diego`, Boston, `New York`, Philadelphia, Jacksonville, Miami)
if (!require("weathermetrics")) {
install.packages("weathermetrics")
library("weathermetrics")
}
temp=as.data.frame(temp)
hum=as.data.frame(hum)
heat=temp
for(i in c(8:17)){
heat[,i]=heat.index(t=temp[,i], rh=hum[,i], temperature.metric = "celsius", output.metric = "celsius", round = 1)
}
if (!require("ThermIndex")) {
install.packages("ThermIndex")
library("ThermIndex")
}
wind=as.data.frame(wind)
chill=temp
for(i in c(8:17)){
chill[,i]=wc(temp[,i], wind[,i])
}
Apparent temperature, is derived from either a combination of temperature and wind wind chill or temperature and humidity heat index for the indicated hour. When the temperature is lower than 10\(^0C\) (50\(^0F\)) and the wind speed is higher than 1.34 m/s (3 mph), wind chill was used for that point for the Apparent Temperature. When the temperature is above 26.7\(^0C\) (80\(^0F\)), the heat index will be used for apparent temperature. Otherwise, the apparent temperature will be the ambient air temperature.
app=temp
for(i in c(1:43812)){
for (j in c(8:17)){
app[i,j]=ifelse(is.na(app[i,j]), NA, ifelse(app[i,j]>26.7, heat[i,j], ifelse(app[i,j]<10 & wind[i,j]>1.34, chill[i,j], temp[i,j])))
}
}
a=heat %>% summarize_all(funs(sum(is.na(.)) / length(.)))
a=data.frame(t(a))
colnames(a)="heat_NA"
b=chill %>% summarize_all(funs(sum(is.na(.)) / length(.)))
b=data.frame(t(b))
colnames(b)="chill_NA"
c=temp %>% summarize_all(funs(sum(is.na(.)) / length(.)))
c=data.frame(t(c))
colnames(c)="temp_NA"
d=des %>% summarize_all(funs(sum(is.na(.)) / length(.)))
d=data.frame(t(d))
colnames(d)="des_NA"
e=app %>% summarize_all(funs(sum(is.na(.)) / length(.)))
e=data.frame(t(e))
colnames(e)="app_NA"
f=hum %>% summarize_all(funs(sum(is.na(.)) / length(.)))
f=data.frame(t(f))
colnames(f)="hum_NA"
g=wind %>% summarize_all(funs(sum(is.na(.)) / length(.)))
g=data.frame(t(g))
colnames(g)="wind_NA"
NA_prop=cbind(a,b,c,d,e,f,g)*100
NA_prop=NA_prop[c(8:17),]
library(tidyverse)
library(ggplot2)
library(gridExtra)
if (!require("grid")) {
install.packages("grid")
library("grid")
}
plot=list()
for(i in c(8:17)){
b=names(temp[i])
a=cbind(temp[,1:7], temp=temp[,i])
names(a)[8]="temp"
aa=a %>%
group_by(year, month, day) %>%
summarize(highest=max(temp, na.rm=TRUE), average=mean(temp, na.rm=TRUE), lowest=min(temp, na.rm=TRUE))
aaa=left_join(aa,(a %>% select(year, month, day, term, term2, date)),by=c("year", "month", "day"))
aaa=aaa %>%
filter(!duplicated(date)) %>%
gather(type, temp, c(`highest`, `average`, `lowest`))
trendhigh=lm(temp~time(temp), data=aaa[aaa$type=="highest",])
trendavg=lm(temp~time(temp), data=aaa[aaa$type=="average",])
trendlow=lm(temp~time(temp), data=aaa[aaa$type=="lowest",])
aaa$trend=c(fitted(trendhigh), fitted(trendavg), fitted(trendlow))
p=ggplot(aaa, aes(date, temp, color=term)) +
geom_point(alpha=1) +
geom_line(aes(date, trend), color="black") +
scale_y_continuous("daily temperature (??)", breaks=c(-20,-10,0,10,20,30,40)) +
xlab("") +
ggtitle(b)+
facet_grid(. ~ type) +
theme(title = element_text(colour="black",size=13),
axis.text = element_text(colour="grey20",size=11),
axis.title = element_text(colour="black",size=12),
legend.text = element_text(colour="black",size=12),
legend.title = element_text(colour="black",size=12),
strip.text.x = element_text(colour="black",size=12)) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(legend.position = "none")
plot[[i-7]]=p
}
grid.arrange(plot[[1]], plot[[2]], plot[[3]], plot[[4]], plot[[5]], plot[[6]], plot[[7]], plot[[8]], plot[[9]], plot[[10]], ncol=2, nrow=5)
We organized all the temperatures in average, highest, and lowest of each day in the past fives years. The scatterplots for hte 10 cities are shown below, and we could visually see that the temperature are roughly following an increasing trend over the past five years, especially in west coast cities like Portland, San Francisco, Los Angeles, and San Diego, and east coast cities such as Jacksonville and Miami.
if (!require("zoo")) {
install.packages("zoo")
library("zoo")
}
plot=list()
grid_arrange_shared_legend <- function(..., ncol = length(list(...)), nrow = 1, position = c("bottom", "right")) {
plots <- list(...)
position <- match.arg(position)
g <- ggplotGrob(plots[[1]] +
theme(legend.position = position))$grobs
legend <- g[[which(sapply(g, function(x) x$name) == "guide-box")]]
lheight <- sum(legend$height)
lwidth <- sum(legend$width)
gl <- lapply(plots, function(x) x +
theme(legend.position = "none"))
gl <- c(gl, ncol = ncol, nrow = nrow)
combined <- switch(position,
"bottom" = arrangeGrob(do.call(arrangeGrob, gl),
legend,ncol = 1,
heights = unit.c(unit(1, "npc") - lheight, lheight)),
"right" = arrangeGrob(do.call(arrangeGrob, gl),
legend, ncol = 2,
widths = unit.c(unit(1, "npc") - lwidth, lwidth)))
grid.newpage()
grid.draw(combined)
invisible(combined)
}
for(i in c(8:17)){
cityname=names(temp[i])
temp_city=cbind(temp[,1:7], city=temp[,i])
names(temp_city)[8]="temp"
app_city=cbind(app[,1:7], city=app[,i])
names(app_city)[8]="app"
temp_city <- temp_city %>%
group_by(year, month, day) %>%
summarise(highest=max(temp, na.rm=TRUE), lowest=min(temp, na.rm=TRUE)) %>%
ungroup() %>%
group_by(year, month) %>%
summarise(highest_air=mean(highest), lowest_air=mean(lowest)) %>%
ungroup()
app_city <- app_city %>%
group_by(year, month, day) %>%
summarise(highest=max(app, na.rm=TRUE), lowest=min(app, na.rm=TRUE)) %>%
ungroup() %>%
group_by(year, month) %>%
summarise(highest_apparent=mean(highest), lowest_apparent=mean(lowest)) %>%
ungroup() %>%
left_join(temp_city, by=c("year","month"))
app_city=app_city %>% gather(type, temperature, -month, -year)
mon=paste(app_city$year, app_city$month, sep="-")
mon=as.yearmon(mon)
app_city=cbind(app_city, mon=mon)
p <- app_city %>%
ggplot(aes(mon, temperature, col=type, group=type)) +
geom_line(size=0.8) +
xlab("") +
scale_y_continuous("daily temperature (??)", breaks=c(-20,-10,0,10,20,30,40)) +
ggtitle(cityname) +
theme(title = element_text(colour="black",size=15),
axis.text = element_text(colour="grey20",size=13),
axis.title = element_text(colour="black",size=14),
legend.text = element_text(colour="black",size=15),
legend.title = element_text(colour="black",size=15)) +
theme(plot.title = element_text(hjust = 0.5))
plot[[i-7]]=p
}
grid_arrange_shared_legend(plot[[1]], plot[[2]], plot[[3]], plot[[4]], plot[[5]], plot[[6]], plot[[7]], plot[[8]], plot[[9]], plot[[10]], ncol=3, nrow=4, position="right")
We plotted the daily highest and lowest air temperature comparing to the corresponding apparent temperatures of the 10 US cities in the past five years. A clear finding here is that the difference between the daily highest apparent temperature and air temperature is becoming larger in summer in the recent three years in San Francisco, Los Angeles, and Boston, compared to the earlier 2 years. This means people in these cities were feeling hotter than before.
term=list()
for(i in c(8:17)){
term[[i-7]]=levels(as.factor(as.data.frame(des)[,i]))
}
term=unlist(term)
term=as.data.frame(term)
names(term)=c("term")
term=term %>%
filter(!duplicated(term))
term
## term
## 1 broken clouds
## 2 drizzle
## 3 few clouds
## 4 fog
## 5 haze
## 6 heavy intensity drizzle
## 7 heavy intensity rain
## 8 heavy intensity shower rain
## 9 heavy snow
## 10 light intensity drizzle
## 11 light intensity shower rain
## 12 light rain
## 13 light shower snow
## 14 light snow
## 15 mist
## 16 moderate rain
## 17 overcast clouds
## 18 proximity thunderstorm
## 19 scattered clouds
## 20 shower rain
## 21 sky is clear
## 22 smoke
## 23 snow
## 24 squalls
## 25 thunderstorm
## 26 thunderstorm with heavy rain
## 27 thunderstorm with light rain
## 28 thunderstorm with rain
## 29 very heavy rain
## 30 dust
## 31 freezing rain
## 32 sleet
## 33 proximity shower rain
## 34 proximity thunderstorm with rain
## 35 light rain and snow
## 36 heavy thunderstorm
## 37 proximity thunderstorm with drizzle
## 38 sand
## 39 sand/dust whirls
## 40 thunderstorm with light drizzle
## 41 volcanic ash
## 42 tornado
plot=list()
color=c("sunny" ='#fb61d7',"cloudy" ='#00b6eb', "fog" ='#53b400',"haze" ='#c49a00', "rainy" ='#f8766d',"snow" ='#00c094', "storm" ='#a58aff')
for(i in c(8:17)){
b=names(des[i])
a=cbind(des[,1:7], d=des[,i])
names(a)[8]="d"
a=a %>%
filter(!is.na(d)) %>%
mutate(des=ifelse(d %in% c("broken clouds", "scattered clouds", "overcast clouds"), "cloudy", ifelse(d %in% c("fog", "mist"), "fog", ifelse(d %in% c("dust", "haze", "sand", "smoke", "sand/dust whirls", "volcanic ash"), "haze", ifelse(d %in% c("heavy snow", "snow", "light snow", "light shower snow"), "snow", ifelse(d %in% c("squalls", "thunderstorm", "proximity thunderstorm", "heavy thunderstorm", "tornado"), "storm", ifelse(d %in% c("few clouds", "sky is clear"), "sunny", "rainy"))))))) %>%
group_by(term, des) %>%
mutate(count=n()) %>%
filter(!duplicated(count)) %>%
ungroup()
p=a %>%
mutate(des = reorder(des, count, FUN=mean)) %>%
ggplot(aes(fill=des, y=count, x=term)) +
geom_bar(stat="identity", position="fill", width=0.5) +
ggtitle(b) +
scale_y_continuous(labels = scales::percent) +
xlab("") +
ylab("Proportion") +
theme(title = element_text(colour="black",size=15),
axis.text = element_text(colour="grey20",size=13),
axis.title = element_text(colour="black",size=14),
legend.text = element_text(colour="black",size=15),
legend.title = element_text(colour="black",size=15)) +
theme(plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 15, hjust = 0.5, vjust = 0.6)) +
scale_fill_manual(values=color)
plot[[i-7]]=p
}
grid_arrange_shared_legend(plot[[1]], plot[[2]], plot[[3]], plot[[4]], plot[[5]], plot[[6]], plot[[7]], plot[[8]], plot[[9]], plot[[10]], ncol=3, nrow=4, position="right")
We plotted the daily highest and lowest air temperature comparing to the corresponding apparent temperatures of the 10 US cities in the past five years. A clear finding here is that the difference between the daily highest apparent temperature and air temperature is becoming larger in summer in the recent three years in San Francisco, Los Angeles, and Boston, compared to the earlier 2 years. This means people in these cities were feeling hotter than before.
The criteria for nice days are referenced form here
High temperature between 65 and 85\(^0F\) (18.33-29.44\(^0C\))
Maximum dew point temperature less than or equal to 65\(^0F\) (18.33\(^0C\))
Peak daily wind (including gusts) less than 25 mph (11.176 m/s)
Average daily cloud cover less than or equal to 65 percent
No measurable precipitation
Dew point temperature is calculated using temperature and humidity. Cloud coverage is assigned to each description referencing from here.
dew=hum
for(i in c(1:43812)){
for (j in c(8:17)){
dew[i,j]=ifelse(is.na(hum[i,j]), NA, humidity.to.dewpoint(rh=hum[i,j],t=temp[i,j], temperature.metric = "celsius"))
}
}
highestfun=function(x){
a=max(x, na.rm=TRUE)
a
}
cc=function(x){
c=ifelse(is.na(x), NA, ifelse(x=="broken clouds", 75, ifelse(x=="scattered clouds", 44, ifelse(x=="few clouds", 12.5, ifelse(x=="sky is clear", 0, 100)))))
c
}
averagefun=function(x){
a=mean(x, na.rm=TRUE)
a
}
rain=function(x){
a=ifelse(is.na(x), NA, ifelse(x %in% c("broken clouds", "scattered clouds", "overcast clouds", "fog", "mist", "dust", "haze", "sand", "smoke", "sand/dust whirls", "volcanic ash", "squalls", "thunderstorm", "proximity thunderstorm", "heavy thunderstorm", "tornado", "few clouds", "sky is clear"),0,1))
a
}
sumfun=function(x){
a=sum(x, na.rm=TRUE)
a
}
nice_t=temp[,c(4:6, 8:17)] %>%
group_by(year, month, day) %>%
summarise_all(funs(high=highestfun))
nice_d=dew[,c(4:6, 8:17)] %>%
group_by(year, month, day) %>%
summarise_all(funs(high=highestfun))
nice_w=wind[,c(4:6, 8:17)] %>%
group_by(year, month, day) %>%
summarise_all(funs(high=highestfun))
cloud=des[,c(4:17)] %>%
group_by(year, month, day, hour) %>%
summarise_all(funs(c=cc))
nice_c=cloud[,c(1:3, 5:14)] %>%
group_by(year, month, day) %>%
summarise_all(funs(avg=averagefun))
rain=des[,c(4:17)] %>%
group_by(year, month, day, hour) %>%
summarise_all(funs(r=rain))
nice_r=rain[,c(1:3, 5:14)] %>%
group_by(year, month, day) %>%
summarise_all(funs(sum=sumfun))
niceday=nice_r
for(i in 1:1826){
for(j in 4:13){
niceday[i,j]=ifelse((nice_t[i,j]<29.44 & nice_t[i,j]>18.33 & nice_d[i,j]<=18.33 & nice_r[i,j]==0 & nice_w[i,j]<11.176 & nice_c[i,j]<=65),1,0)
}
}
niceday$term[niceday$year==2012 | (niceday$year==2013 & niceday$month<=9)]="2012.10-2013.09"
niceday$term[(niceday$year==2013 & niceday$month>=10) | (niceday$year==2014 & niceday$month<=9)]="2013.10-2014.09"
niceday$term[(niceday$year==2014 & niceday$month>=10) | (niceday$year==2015 & niceday$month<=9)]="2014.10-2015.09"
niceday$term[(niceday$year==2015 & niceday$month>=10) | (niceday$year==2016 & niceday$month<=9)]="2015.10-2016.09"
niceday$term[(niceday$year==2016 & niceday$month>=10) | (niceday$year==2017 & niceday$month<=9)]="2016.10-2017.09"
nicedayresult=niceday %>% group_by(term) %>% summarise_all(funs(sum=sumfun))
nicedayresult=nicedayresult[,5:14]
names(nicedayresult)=c("Seattle", "Portland", "San Francisco", "Los Angeles", "San Diego", "Miami", "Jacksonville", "Philadelphia", "New York", "Boston")
nd=data.frame(apply(nicedayresult,2,mean))
nd$city=row.names(nd)
names(nd)=c("niceday", "city")
p=ggplot(nd, aes(x=reorder(city, niceday), y=niceday)) +
geom_bar(stat='identity', colour="dodgerblue", fill="dodgerblue", width=0.6) +
geom_text(aes(label=niceday), position=position_dodge(width=0.6), hjust=-0.3, vjust=0.3, color="grey20", size=4) +
coord_flip() +
xlab("") +
ylab("") +
scale_y_continuous(limits=c(0, 150)) +
ggtitle("Average Number of Nice Days in a Year") +
theme(title = element_text(colour="black",size=13),
axis.text = element_text(colour="grey20",size=12),
axis.title = element_text(colour="black",size=12),
legend.text = element_text(colour="black",size=12),
legend.title = element_text(colour="black",size=12)) +
theme(plot.title = element_text(hjust = 0.5))
p
Ranked as 1st, Los Angeles has the most nice days in a year among the 10 US cities, and San Diego follows. Miami, which is super hot and humid, has the fewest number of nice days, may not be a good choice for long time living.
The weather score takes both apparent temperature and sunny time into consideration, with a scale of 20 (10 for each). A higher weather score means more suitable for traveling.
Temperature Score (referenced from here): For each hour between 8:00 AM and 6:00 PM of each day in the analysis period (2012.10-2017.09), independent scores are computed for apparent temperature. Those hourly scores are then aggregated into days, averaged over all the five years. Our temperature score is 0 for apparent temperatures below 10 °C (50°F), rising linearly to 9 for 18.3°C (65°F), to 10 for 23.9 °C (75°F), falling linearly to 9 for 26.7 °C (80°F), and to 0 for 32.2 °C (90°F) or hotter.
Functions for the temperature score against temeprature:
10-18.3\(^0C\): y=1.0843*x-10.8434
18.3-23.9\(^0C\): y=0.1786*x+5.7316
23.9-26.7\(^0C\): y=18.5347-0.3571*x
26.7-32.2\(^0C\): y=52.6921-1.6364*x
# temperature score
score_app=app %>%
filter(hour >=8 & hour <=18) %>%
gather(city, app, c(`Seattle`, `Portland`, `San Francisco`, `Los Angeles`, `San Diego`, `Boston`, `New York`, `Philadelphia`, `Jacksonville`, `Miami`)) %>%
mutate(score=case_when(app < 10 | app >=32.2 ~ 0,
app >= 10 & app < 18.3 ~ (1.0843*app-10.8434),
app >= 18.3 & app < 23.9 ~ (0.1786*app+5.7316),
app >=23.9 & app < 26.7 ~ (18.5347-0.3571*app),
app >=26.7 & app < 32.2 ~ (52.6921-1.6364*app),
TRUE ~ NA_real_))
score_app=score_app %>%
group_by(month, city) %>%
mutate(score_tem=mean(score, na.rm=TRUE)) %>%
filter(!duplicated(score_tem)) %>%
select(month, city, score_tem)
Sunny Score (defined by ourselves): The proportion of sunny time between 8:00 AM and 6:00 PM is calculated for each month, and averaged over the past five years. All the sunny time proportions for each city and in each month (10*12=120 120 proportions) were ranked from the lowest to the highest (the rank of the highest proportion is 120, and the lowest is 1). Then our sunny score for a specific city and specific month is its rank divided by 12 (highest proportion has a score of 10, and the lowest 0.1).
# sunny score
score_des=des %>%
filter(hour >=8 & hour <=18) %>%
gather(city, des, c(`Seattle`, `Portland`, `San Francisco`, `Los Angeles`, `San Diego`, `Boston`, `New York`, `Philadelphia`, `Jacksonville`, `Miami`)) %>%
filter(!is.na(des)) %>%
mutate(wea=ifelse(des %in% c("broken clouds", "scattered clouds", "overcast clouds"), "cloudy", ifelse(des %in% c("fog", "mist"), "fog", ifelse(des %in% c("dust", "haze", "sand", "smoke", "sand/dust whirls", "volcanic ash"), "haze", ifelse(des %in% c("heavy snow", "snow", "light snow", "light shower snow"), "snow", ifelse(des %in% c("squalls", "thunderstorm", "proximity thunderstorm", "heavy thunderstorm", "tornado"), "storm", ifelse(des %in% c("few clouds", "sky is clear"), "sunny", "rainy"))))))) %>%
group_by(year, month, city, wea) %>%
mutate(n=n()) %>%
filter(!duplicated(wea)) %>%
select(year, month, city, wea, n) %>%
group_by(year, month, city) %>%
mutate(prop=prop.table(n))
sunny_prop=score_des %>%
filter(wea=="sunny") %>%
ungroup() %>%
select(month, city, prop) %>%
group_by(month, city) %>%
mutate(prop=mean(prop)) %>%
filter(!duplicated(prop)) %>%
ungroup() %>%
mutate(score_sun=rank(prop)/12)
Weather Score = Temperature Score + Sunny Score
# total weather score
weather_score=left_join(score_app, sunny_prop, by=c("month", "city")) %>%
select(month, city, score_tem, score_sun) %>%
mutate(score_t=score_tem+score_sun) %>%
ungroup()
From all the analyses, we found that in the past five years (from October 2012 to September 2017), the temperatures in the 10 chosen US coastal cities (Seattle, Portland, San Francisco, Los Angeles, San Diego, Boston, New York, Philadelphia, Jacksonville, and Miami) roughly followed an upward trend. People in some cities (San Francisco, Los Angeles, and Boston) were feeling hotter during the summer in the recent 3 years than before. While the temperature went up, the sunny time in west coast cities dropped over years, and the fog and haze time increased.
From weather’s perspective, we calculated nice days and weather scores for these cities. Los Angeles turned out to have the most nice days and received the highest weather score, even in December and January. So we highly recommend to visit Los Angeles in this coming Christmas and New Year’s holiday if you haven’t been there before.